What kind of corpus did I choose and why?

I have always been fascinated by musicals. More recently, I was introduced to operas and I recognized the same compelling drama I appreciate in musicals. Opera songs and musical songs both mainly serve to tell a story, but have very different styles. I wonder if opera and musical music share certain aspects, because they both have such a strong narrative function. That is why the main question I want to answer is: “What similarities between musical and opera songs can be found based on Spotify’s API?”  
The corpus that I use for my portfolio consists of 100 opera songs and 100 musical songs, collected from pre-existing playlists that are available on Spotify. I use Spotify’s Opera 100: Spotify Picks playlist as a basis for my opera playlist. This playlist consists of 97 tracks, so I have added 3 tracks manually, based on Spotify’s suggestions. For my musical playlist, I use a public playlist called BROADWAY MUSICALS, made by Hugo Torres. Out of all musical playlists I could find, I found this to be the most inclusive. Furthermore, I chose to focus on Broadway musicals (even though I am myself more familiar with musical movies) because they were written to be performed in front of a live audience, just like operas. The original playlist consists of 150 songs, so I manually removed 50 tracks. I chose to remove tracks from musicals which had more tracks in the playlist, in order to create a playlist with as much different musicals as possible. The majority of songs in the corpus are studio recordings, but there are some live versions in it as well.

Weaknesses of the corpus
Because adding music from every opera and musical would create a very big corpus, tracks from some operas and musicals are essentially missing (also because most operas and musicals have had a lot of productions with different artists/conductors/musicians). This means that my corpus does not cover the whole genre. Furthermore, Spotify’s pre-existing playlists generally include only the well-known (classical) operas and musicals, leaving out smaller productions.  

Expectations
In comparing opera tracks with musical tracks, I expect to find a big difference in tempo and danceability. Furthermore, I think opera songs are sadder than musical songs, which might be reflected in the valence. I am curious to find out if the energy and loudness differ between the groups, and I think the highest chance of finding similarities between the genres lies in these features.  

Check out the musical part and the opera part of my corpus.

Are musical songs really happier than opera songs? Comparing track-level features of Spotify’s API.


I started by visualizing the distribution of valence against energy for both musical and opera tracks. Furthermore, this plot shows the danceability (size) and loudness (color) of the songs in my corpus.

Musical tracks cover a wider range of valence and energy than opera tracks. I think this might be because musicals can be written in a variety of different styles/genres, whereas operas often have the same style. I was surprised by the fact that the graph shows so little difference between opera tracks, I would’ve expected at least a little more variety. This plot strongly suggests that almost all opera songs in my playlist are very sad.

Furthermore, as expected, musical songs are generally more danceable than opera songs. It is interesting that musical tracks seem to have a linear relationship: the more energy a track has, the higher the valence. This would mean that there are not many relaxed or angry musical songs.

Take a look at the clear outliers in both groups. Memory (from the musical Cats) is a particular sad musical song, while Stizzoso, mio stizzoso, voi fate il borioso (from the opera La Serva Padrona) is a particular happy opera song (and also turns out to be the most danceable opera song).

When you look at the loudness feature, you see a big difference between musical and opera songs: musical songs are much louder than opera songs. I can imagine that this is the case because musical songs are somewhat more hysterical sometimes, but I did not expect to see a difference this big.

Because the opera songs are not distributed in an even way, it is hard to see the data. That’s why I decided to take a closer look in the next tab.

Take a closer look at the opera playlist.


Take a look at a zoomed in version of the opera plot you saw in the previous tab. In this plot, another outlier stands out: Et maintenant je dois offrir (from the opera Les Huguenots) has the highest energy of all opera songs in the corpus. I manually chose different x and y limits for this plot, so the datapoints are somewhat clearer now. However, they are still very much clustered together in the low-Energy-low-Valence corner. This means that almost all opera songs in my corpus are sad.

When you look at Spotify’s explanation of energy, you find this:

“Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically,energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale”.

Given this definition, I think it is not surprising that opera songs are low in energy. However, I expected to find more songs with higher energy because opera songs can be very temperamental, which I thought would be reflected in the energy values. It could also very well be the case that these temperamental songs are just not represented well in my corpus.

As for the low valence values for opera songs: I think this can be explained by the fact that many operas are tragedies, so using sad songs in these stories makes total sense.

What’s so special about “Stizzoso, mio stizzoso, voi fate il borioso”? Check out its chromagram and compare it to that of a typical opera song.


Here, you see a chomagram of the song “Stizzoso, mio stizzoso, voi fate il borioso” (from the opera La Serva Padrona). The norm for the chroma vectors used here is euclidean.

What you can tell from this chromagram is that the song mainly jumps from E to A. Other than that, it is quite hard to tell what the chromagram represents, because it’s kind of all over the place.

In order to make the analyisis of this song more meaningful, I compared its chromagram to the chromagram of a very average song in my corpus: “Gluck, das mir verblieb” (from the opera Die Tote Stadt). You can find this chromagram in the next tab.

What you can see in this chromagram is that the two mainly used pitches in the song (F and A#) are much clearer. Maybe this means that “Stizzoso, mio stizzoso, voi fate il borioso” just has more different pitches in it.

When you listen to the two songs, it makes sense that Spotify classifies “Stizzoso, mio stizzoso, voi fate il borioso” to be much happier than “Gluck, das mir verblieb”. The former has lots of happy violins in it, while the latter is very melodramatic.

What’s the difference in tempo between opera and musical songs? Comparing low-level audio analysis features at the playlist level.


Here, the standard deviation and mean of the tempo of songs in the corpus. This graph includes the first 30 songs of each playlist, because adding audio analysis for every track is a slow operation. This selection of songs is representative for the corpus as a whole, because each playlist contains only one song from the same opera/musical and they were put together in a random order.

The graph shows a clear difference between opera and musical songs. We can conclude that the standard deviation of musical songs is generally smaller than that of opera songs. This means that, within the same track, opera songs tend to differ more in tempo (i.e., they have obvious slower and faster parts) than musical songs. Musical songs tend to hold on the the same tempo throughout the song. Interesting to note here is that the graph also shows a slight difference in duration for opera and musical songs. Opera songs tend to be longer, so maybe this gives more space to vary in tempo throughout a song.

Furthermore, this graph makes clear that musical songs tend to have a higher mean tempo than opera songs. This may be related to the fact that musical songs have a higher danceability according to Spotify’s API. There is also a much bigger variation in the mean tempo of musical songs compared to the mean tempo of opera songs, meaning that musical songs differ in tempo more than opera songs. In the interactive visualization (on the second tab) that I made, you could also find that musical songs cover a wider range of energy and valence than opera songs do. It could be the case that the range of mean tempo is related to this.

How does Spotify’s valence feature relate to the tempo of a song? Comparing tempograms of the musical songs with the highest and lowest valence values in my corpus.


Here, you see tempograms for Falling Slowly (Once) and Fame (Fame). When you look at my interactive plot, you can see that Falling Slowly has the lowest valence value in the musical playlist, while Fame has the highest. There’s also a substantial difference in energy values (0.124 for Falling Slowly vs. 0.73 for Fame). I was wondering whether these differences are reflected in the tempograms of the song. Consequently, this tab shows a little investigation into Spotify’s API rather than my corpus.

The short answer is: not really. Both graphs show a clear line at around 250 BPM, indicating that the mean tempi of the song are close together. This is reflected in my interactive plot, which shows that Spotify indicates a tempo of 128 BPM for Falling Slowly and a tempo of 131 BPM for Fame. This implies that the bright yellow lines in the tempograms actually are tempo harmonics.

The one noticeable difference between the tempograms is the fact that the one for Fame is much clearer than the one for Falling Slowly, which shows a kind of panther print. This could mean two things: either Spotify has more trouble with beat tracking in Falling Slowly, or there are more tempi changes in Falling Slowly. While listening to Falling Slowly, you can hear a small increase in tempo at around 50 seconds (also reflected in the tempogram), but other than that, I would say the tempo of the song doesn’t change that much. This leads me to conclude that Spotify has more trouble detecting the tempo of this song. This could be explained by the fact you can hear very clear beats throughout Fame, while these clear beats are missing in Falling Slowly.

To answer the main question of this tab: I can’t detect a clear relation between the valence values and tempograms of Falling Slowly and Fame. Apart from the fact that Fame seems to have a stronger beat, there is no big difference in tempi. Of course, it could be the case that these songs are just not good examples. It could also be the case that the strong beat does relate to valence or energy somehow, but I can’t draw a conclusion based on only 1 example.

Investigating my favourite musical song: West Side Story’s America (self-similarity matrices).


Here, you can compare the chroma-based and timbre-based SSMs for the song America. (West Side Story). The norm used is euclidean, the distance metric used is cosine, and the summary statistic used is root mean square. The Spotify segment used is “bars”.

A few things about these SSMs stand out to me. First, you can see that a very clear line appears around 50 seconds in the chroma SSM. This indicates a novelty in the song: at this time point, you can probably hear a unique pitch in the song. When you listen to the song, you can hear that this is indeed a turning point in the song. The time point seems to correspond with the transition of the intro into the first chorus. More specifically, the 50 second time mark precisely corresponds to the “I know you do!” exclamation. Interestingly enough, the timbre SSM does not really show the transition of the intro into the rest of the song.

In the timbre SSM, you can find a clear line right at the beginning of the song. This line indicates a unique appearance of timbre/instruments in the song. Indeed, the very beginning of the song (first 10 seconds) consists of some kind of percussion which is not strongly present in the rest of the song.

Secondly, you can see several diagonal lines parallel to the main diagonal from around 120 seconds until the end of the song in the chroma SSM. Diagonal lines indicate repetitions in the song. This pattern makes sense, because from this time mark onward the song basically only consists of repetitions. At 120 seconds, the main instrumental part of the song begins, and this pattern is repeated throughout the song.

Finally, the chroma SSM shows 3 visible distinct parts of the song. 0-50 seconds indicates the intro of the song. Then, there is a block between 50 and 120 seconds, which consists of the first verses and chorus with relatively few musical instruments. After 120 seconds, the third part starts and the energy of the song increases.

In the timbre SSM, you can see that the song can be divided into 2 main parts: 0-120 seconds and 120-300 seconds. At 120 seconds, the “big” instruments tune in and the energy is turned up (and the big dancing scene begins).

How well can classifiers distinguish between opera and musical songs?

For both classifiers, I used 10-fold cross-validation. As you can see in the heatmaps, both the kNN and the random forest classifier perform very well. As expected (from class), random forest performed slightly better, The precision of the musical class even reached 1, meaning that the model never falsely labeled an opera track as a musical track. Consequently, the recall of the opera class is 1, which means that all opera tracks were correctly labeled.

I tried to use the most important discriminators (loudness, timbre feature c01, acousticness, and energy) to improve the feature selection process, but it didn’t improve. I think this is the case because the models are already very good.

It is not surprising to me that acousticness and loudness are the most important discriminators. It makes sense that opera songs are more acoustic than musical songs, and after seeing the big differences in loudness in the second tab, this results does not come as a surprise. However, prior to the first visualization I would not have suspected that the loudness and energy of opera an musical songs serve as big discriminators.

I was surprised to find that the importance of tempo in discriminating between the genres is so low, considering that we have seen that the mean tempo of musical songs is quite higher than that of opera songs.

When you look at the visualization of the clusters, you understand why the classifiers work so well. There are two very distinct clusters with minimal overlap. The plot also explains why some musical songs get labeled as opera songs in the random forest model. It’s interesting to note here that the kNN model makes some wrongfully labels some opera songs as musical songs, although here the opera cluster seems quite clear. The outliers may play a role in this.

The high performance of the models looks suspicious to me. Because the corpus that I use is rather small, the external validity of this result is questionable: I am not sure whether you can generalize this clear distinction between opera and musical songs to a broader scale. The high performance is most likely due to the small corpus. It is however quite nice to see that Spotify’s features capture very clear distinctions between the two genres, although the main discriminators are different than what I expected.

Conclusion: are there any similarities between opera songs and musical songs?

After building my portfolio, I can conclude that there are no noticeable similarities between the opera and musical songs in my corpus based on Spotify’s API. Especially the fact that the kNN and random forest classifier performed so well tells me that Spotify can distinguish really well between the two genres.

I expected to find similarities in loudness and energy based on my initial intuition, but they both turned out to be quite important discriminators. Given that I do not have any musical background (other than enjoying listening to music), it is however no surprise that my intuition was flawed.

I presented contradictory findings concerning the tempo feature: on the one hand, it seemed like the mean tempo of opera songs is quite lower than that of musical songs. On the other hand, the list of discriminators based on the random forest classifier showed that tempo was not an important discriminator. Therefore, I cannot draw an evident conclusion.

Furthermore, I have presented a very small investigation into Spotify’s valence feature, independent of the rest of my portfolio. This investigation showed that there is no clear relation between valence and tempo of songs. However, it is way to rigorous to draw a general conclusion based on only one example.

This leads me to the more general point that my corpus is very limited and therefore does not represent all opera and musical music. Therefore, I cannot with certainty conclude that there are no similarities between opera and musical songs in general. However, I am quite confident that this portfolio shows that there are way more differences than similarities between the two. I guess that both opera and musical songs serve to tell a story, but they do so in a very different manner.